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Abstract 

fSJ -We study the Bipartite Unconstrained 0-1 Quadratic Programming Problem (BQP) which is a relaxation of the Un- 
constrained 0-1 Quadratic Programming Problem. Applications of BQP include mining discrete patterns from binary 
data, approximating matrices by rank-one binary matrices, computing cut-norm of a matrix, and solving optimization 
problems such as maximum weight biclique, bipartite maximum weight cut, maximum weight induced subgraph of a 
bipartite graph, etc. We propose several classes of heuristic approaches to solve BQP and discuss a number of construc- 
tion algorithms, local search neighborhoods and their combinations. Results of extensive computational experiments 
are reported to establish the practical performance of our algorithms. For this purpose, we propose several sets of test 
instances based on different applications of BQP. 
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^ The Unconstrained 0-1 Quadratic Programming Prob- 
lem (QP) is to 

maximize f{x) — Q' x + c' x + Cq 
\ subject to a; e {0, 1}", 

'where Q' is an n x n matrix, c' is a row vector in M", and 
'Cq is a constant. QP is a well-studied proble m in the oper- 
ations research literature ( Billionnetl. |2004 |) . The focus of 
■this paper is on a problem closely related to the QP called 
the Bipartite Unconstrain ed 0- 1 Qua dratic Programming 
.Problem (BQP) (Punnen et al.l . l2012l ). The BQP can be 
'defined as follows: 



maximize /(x, y) — x^Qy - 



dy + co 



subject to x e {0, 1}", y e {0, 1}", 

where Q = [qij) is an m x n matrix, c = (ci, C2, . . . , Cm) 
is a row vector in R™, d = (di, d2, . . . , d„) is a row vector 
in M" and cq is a constant. Without loss of generality, 
we assume that m < n and cq = 0. In what follows, we 
denote a BQP instance built on matrix Q, row vectors c 
and d and cq = as BQP((5,c, d), and {x,y) is a feasible 
solution of the BQP if a; e {0, 1}" and y G {0, 1}". Also 
Xi stands for the ith component of the vector x and yj 
stands for the jth component of the vector y. 



"This research work was supported by an NSERC Discovery grant 
awarded to Abraham P. Punnen. 
'Corresponding author. 



By a simple transformation, th e BQP can be formu- 



lated as a QP of size m -\- n, see (jPunnen et al.l . I2012I ). 



Since any feasible solution of such a QP instance corre- 
sponds to a feasible solution of the original BQP, both 
exact and heuristic algorithms available to solve the QP 
can be used directly to solve the BQP. However, solving 
BQP instances by converting them into QP instances and 
then applying QP solvers is rather inefficient; indeed, the 
obtained QP instances are of larger size, and, further, QP 
algorithms cannot exploit the special structure of the prob- 
lem. Thus, in this paper we focus on heuristics designed 
specifically for the BQP. 

A graph the oretic interpretation of the BQP can be 



20121) . Let / = {1,2, 



given as follows () Punnen et al 
and J = {1,2,. ..,n}. Consider a bipartite graph G = 
(/, J, E). For each node i G I and j G J, respective costs 
Ci and dj are prescribed. Further, for each {i,j) G E, a. 
cost Qij is given. Then the Maximum Weight Induced Sub- 
graph Problem on G is to find a subgraph G' = (/', J', E') 
such that J2iei' '^i+J2jeJ' +J2(i,j)(^E' 1v maximized, 
where I' C I, J' C J and G" is induced by /' U J'. The 
Maximum Weight Induced Subgraph Problem on G is pre- 
cisely the BQP, where Qij = if (i, j) ^ E. 

There are some other well known combinatorial opti- 
mization problems that can be modelled as BQP. Consider 
the bipartite graph G = {I, J, E) with Wij being the weight 
of the edge {i,j) G E. Then the Ma ximum Wei ght Bi clique 
Problem (MWBP) (|Ambiihl et al.l . I2OIII: iTanl . boOSt) is to 
find a biclique in G of maximum total edge- weight. Define 



} 



q^j 



-M otherwise. 
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where M is a large positive number. Set c a nd d as zero 



vecto rs. Then BQP(Q, c, d) solves the MWBP (|Punnen et all 
2OI2I) . This immediately shows that the BQP is NP- 



hard and one can also establish some appro ximation hard 



ness r esult s with appropriate assumptions (jAmbiihl et al 



201 ll: iTanl . l2008h . The MWBP has applications in data 



mining, clustering and bioinformatics (jChang et al.l . 12012 



Tanav et al.l . 120021) which in turn become applications of 



BQP. 

Another application of BQP arise s in approximating 
a matrix by a rank-one binary matrix ( Gilhs and Glineun. 



201l| :lKovutiirk et al.l . [2005ll200fil:lLu et al.Ll201ll:IShen et al 



20091 ). For example, let H = {hij) be a given mxn matrix 



and we want to find an m x n matrix A — (uij ) , where a.i 
UiVj and Ui,Vj e {0, 1}, such that J^'iLi YTj^ii^i] 
is minimized. The matrix A is called a rank one approx- 
imation of H and can be identified by solving the BQP 



1 - 2hi 



and do 



for all i G / 



with qij — 
and j e J. 

topic in mining d i screte patterns in binary data (|Lu et al 
l201lUShen elaP . l2009l) . If Ui and Vj are required to be in 



Binary matrix factorization is an important 



{ — 1,1} then also the resulting factorization problem can 
be formulated as a BQP. 

The Maximum Cut Problem on a bip artite graph (Max- 



Cut) can be formulated as a BQP (|Punnen et al.l . 120121) 



and this gives yet another application of the model. BQP 
can also be used to find approxim ations to the cut-norm 



of a matrix (Alon and Naod 



gproximf 

JSooi. 



To the best of our knowledge, heuristic algorithms for 
the BQP were never investigated thoroughly in the the 
literature except some result on variations of block coor- 
dinate descent type algorithm applied to BQP((5, 0™, 0"), 
where O'^ is a zero vector in M*^ . We propose several classes 
of problem-specific heuristics and neighborhoods, provide 
efficient neighborhood exploration algorithms and discuss 
the performance of the obtained methods based on their 
theoretical properties and experimental results. 

The paper is organized as follows. In Section [51 we 
present several algorithms for quick construction of BQP 
solutions. In Section [31 we propose more advanced heuris- 
tic approaches to solve the problem. Our testbed for ex- 
perimental analysis is described in Section [31 The results 
of computational experiments are reported and discussed 
in Section [5l Finally, the concluding remarks are provided 
in Section [51 

2. Construction Heuristics 

Let us start from a general observation. Assume that 
for some i G / both > and qij > for all j E J. Then 
the value of Xi can be fixed to 1, and this will preserve 
the optimal solutions. If < and qij < for all j G J, 
then Xi can be fixed to 0. Similar results can be obtained 
for the Tjj variables. Thus, all the NP-hard instances have 
mixed positive and negative values of qij and/or c; and dj. 

Hence, the objective of a feasible solution may be either 
negative or positive. However, the optimal objective values 



are always non-negative since a trivial solution {x,y) = 
(0'",0"') achieves f{x,y) — for any problem instance. 
A trivial solution can be used as a starting point for an 
improvement heuristic. However, it is worth noting that 
this solution may turn out to be a deep local maximum 
for some local search neighborhoods and, thus, should be 
used carefully. 

In order to obtain several different starting points, one 
can use random solutions. A random solution {x,y) is 
obtained by choosing Xi and yj randomly for each i and 
j, respectively. Observe that the expected value E[/] of a 
random solution is 



E[/] = nimiq + mic + nid, 



(1) 



where ni and mi are the expected numbers of I's in y 
and X, respectively, and g, c and d are the averages of 
q, c and d. The values ni and mi can be calculated as 
ni = lEEjeJ^j] = n-PiVj = 1) and mi = ^J2zei^i\ = 
m-p{xi — 1). Thus, if g < 0, c < and d < 0, the expected 
objective value E[/] of a random solution will be negative. 
An attempt to improve such a solution with a simple local 
search will usually generate a trivial solution. 

Thus, a better approach is to use the following con- 
struction heuristic. Let positive{v) = v ii v > and 
positive{v) — otherwise. Let w^' — Ci+'^j^j positive{qij). 
Order the rows of the problem such that > w^_^^ for 
i = l,2,...,m — 1. On the ith iteration, choose the best 
value of Xi with the assumption that xi,X2t ■ ■ ,Xi-i are 
fixed, Xi+i = Xi+2 = ■ ■ ■ = Xm = and y is selected op- 
timally. Note that the latter can be done efficiently since 
an optimal value oi y — y{x) given a fixed x is as follows: 

{1 if j G J and qijXi + dj > 0, 
^eI (2) 
otherwise. 

We call this algorithm Greedy and say that a solution pro- 
duced by the Greedy algorithm is a Greedy solution. Our 
implementation of the Greedy heuristic (see Algorithm [H) 
terminates in 0{mn) time. 

Below, we theorems prove some properties of the Greedy 
algorithm. 

Remark 1. If m < 2, the Greedy algorithm produces an 
optimal solution. 

Proof. Let m = 1. Then the Greedy algorithm tests all 
possible values of x and for each of those finds the optimal 
y, hence, yielding an optimal solution. 

Let m = 2. Observe that the Greedy algorithm selects 
the best of solutions {x,y), where y G {0,1}" and x is 
xg{(0,0)^, (1,0)^, (1,1)^}. Observe also that if there 
exists an optimal solution {x,y) such that x — (0,1)^, 
then f{x, y) = and, hence, there exists another optimal 
solution (x', y') such that x = (1, 0)-^ and f{x', y') = = 
W2 ■ The result follows. □ 
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Algorithm 1: The Greedy algorithm hTiplementa- 
tion. 



1 Order the rows of the problem such that wj' > w. 
for z e / \ {m}; 

2 Sj <— dj for j e J; 

3 for « ^ 1 to m do 



/i ^ Cj + X]jg,7Posi<we(sj 
if /o > /i then ^ 0; 
else 

^ 1; 

^ Sj + for each j e J; 



10 for j 1 to n do 



11 

12 



if Sj > then ^ 1; 
else j/i 4- 0; 



Theorem 1. For m > 2, the Greedy algorithm provides a 
m-i -(iPPi~oximation of the optimal solution, and this bound 
is sharp. 

Proof. Let wf = Ci+J^j^j positive{qij) and the rows to be 



ordered such that > w^_^^ for i 



1,2,.. 



1. We 



assume that wj" > as otherwise the Greedy algorithm 
produces a trivial solution which is optimal. 

Let (x*, y*) be an optimal solution of the problem. As- 
sume x* = for some i E I. Then the optimal objective 
f{x*,y*) is at most • (to — 1). Thus, if {x, y) is a Greedy 
solution, 



f{x,y) 
f(x*,y*] 



> 



(m - 1) TO 



1 



Now assume that x* = 1 for each i E I. Let w* = 
^jej lijVj ■ Recall that, after two iterations, the Greedy 
algorithm produces an optimal solution for the problem 
consisting of the first two rows. Thus, w* + < f{x,y). 
At the same time, /(x, y) > wf > > w* for any i G / 
and, hence. 



f{x,y) 



> 



fix,y) 



1 



f{x*,y*) f{x,y) + {m -2)f{x,y) m - I ' 
To establish that the bound is sharp, consider the 

m — 1 ^' 

following example, where m = n and c and d are zero 
vectors: 



Q = 



1 —n —n 
-n 1 
-n 1 











Observe that (x, y) provides an optimal solution with ob- 
jective value /(x, y) = m — 1 if x = y = (0, 1, 1, ... , 1)-^. 
Indeed, if xi ~ 1, then 2/2 = 2/3 = ■ • • = y?i = and 



f{x,y) < 1. Since the Greedy algorithm fixes xi = 1 on 
its first iteration, it yields a solution with objective value 
1. The result follows. □ 

3. Improvement Algorithms 

The Greedy heuristic proposed above is usually not 
enough to produce solutions of sufficient quality. If that is 
the case, one can further improve the obtained solutions 
with some of the algorithms discussed in this section. 

3.1. Alternating Algorithm 

This approach is well-known as block coordinate de- 
scent algorithm in the non-linear optimization literature 
and was used used to solve the BQP((5, 0™, 0") problem. 
Further, we provide a brief definition of the algorithm for 
the BQP((5, c, d) and discuss the ways to improve its per- 
formance. 

Observe that, analogous to given a fixed y, one can 
efficiently compute the optimal x = x{y) using 

{1 a i E I and Qijyi -t- Q > 0, 
j£J (3) 
otherwise. 

The Alternating Algorithm is a simple heuristic that al- 
ternatively fixes X and y to improve y and x, respectively, 
until a local optimum is reached, see Algorithmic] 

Algorithm 2: A naive implementation of the Alter- 
nating local search. 

1 A ^ 1; 

2 loop 
3 
4 



if y{x) ^ y then 
|_ y ^ y(x); 

else if A = then break; 

A 0; 

if y{x) ^ y then 
\_y^ y(x); 
else break; 



It takes 0(nm) time for the Alternating algorithm to 
explore the neighborhood Aait(x",y°) = {(x, y") : x G 
{0,l}'"}u{(x°,2/) : t/G{0, 1}"}. It is easy to verify that 
|iVait(x°,yO)| = 2" + 2"-l. 

Theorem 2. // a solution (x, y) is a local maximum in 
Nait{x,y), then f{x,y) > 0. 

Proof. Consider a trivial solution (0™, 0"). Observe that 
(0™, 0") e N^itix, y) for any solution (x, y). Since /(O™, 0") 
for any BQP instance, the objective of the best solution 
in Na,it{x,y) is at least 0. □ 
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Our contribution is an improvement of the Alternating 
algorithm in terms of average running time. We noticed 
that, in practice, the number of iterations of the Alter- 
nating heuristic may be noticeable but, after a couple of 
iterations, only a few variables Xi and yj are getting up- 
dated. By maintaining two arrays, namely s and w, one 
can reduce the time needed to calculate x{y) and y{x) to 
0{m) are 0{n) time, respectively, see Algorithm |31 Up- 
dating the arrays s and w, though, takes 0{mn) time 
in the worst case but it is significantly faster if only a 
small number of values of Xi or yj are modified. Thus, 
the improved implementation provides significantly better 
average performance while preserving the worst case time 
complexity. Observe also that such an implementation is 
much more friendly than Algori thm[2]with regards to CPU 



cache ( Karaoetvan et al. . l2009l) 



Algorithm 3: An efficient implementation of the Al- 
ternating local search. 

1 Sj ^ dj + Qij^i foi" each j G J; 



3 X i 1; 

4 while A < do 



5 
6 
7 
8 
9 
10 
11 
12 

13 
14 
15 
16 
17 
18 
19 
20 
21 



A ^ A + 1; 

for j ^ 1 to n do 

if yj = and > then 



1; A 0; 

Wi + qij for each i G J; 



else if yj = 1 and Sj < then 

^ 0; A ^ 0; 
Wi 4— Wi — qij for each i £ I; 



if A = 1 then break; 

A ^ 1; 

for i <^ 1 to m do 

if = and Wi > then 

Xi ^ 1; X ^ 0; 



Sj Sj + Qij for each j G J; 
else if = 1 and < then 

Xi ^ 0; A ^ 0; 



Sj — Qij for each j G J; 



3.2. Portions-Based Algorithms 

Another approach to improve a BQP solution is to fix 
several variables Xi and to solve the resulting constrained 
problem. Let BQP{Q, c, d, I*, x'^) be defined as follows: 

maximize x^ Qy + cx + dy 
subject to Xi = x^ for i ^ I* , 

xG{0,l}™,yG{0,ir. 

Observe that BQP((3, c, d, /*, x") can be reduced to 
BQP {Q*,c*,d*), where Q* = {qij) is a /c x n matrix, c 



is a vector in R*^, c? is a vector in M" and k — \I*\. Indeed, 
let /* = {7r(l),7r(2), . . . ,7r(fc)} and, for i = 1, 2, . . . , fc, let 



97r(i 



T(^) 



and d* 



Let {x*,y*) be an optimal solution of BQP(Q*, c*, d*), 
where Q* = {q*j) is an |/*| x n matrix. Define 



if i 7r(r) for some r, 
otherwise 



and 



y = y 



Then (x,?/) is an optimal solution to BQP(Q, c, d, /*, x"). 

The corresponding neighborhood Nj, (x° ,y^) = { (x, y) : 
X G {0, 1}™, y G {0, 1}", Xi = x° for i ^ /*} is of size 
jA'/. (x°, y°)| = 2^^ 1+" and can be explored in 0{mn + 
n2l^ I) time as it takes 0{mn) time to produce a reduced 
problem BQP((?*, c*, d*) and 0{n2^') time to solve it. Ob- 
serve that, if |/*| is O(logm), the complexity of the first 
stage of the algorithm may dominate that of the second 
stage. When several neighborhoods Nj-. (x, y) are to be 
explored, the total time complexity of the first stage can 
be reduced. In particular, by precalculating Sj — dj + 
X^ig/ lij^i 8.nd further maintaining these values, one can 
explore k neighborhoods Ni* (x, y) in 0{mn+Kn2^^ ') time. 
Hereafter, we assume this technique to be used in all im- 
plementations of the derived heuristics. 

We now propose several improvement algorithms ex- 
ploiting this approach. Consider Exhaustive Portions neigh- 
borhood defined as follows: 



N^.i=c,y) = 



U Ni.ix,y). 



I*Cl,\I'\=k 

Observe that the intersection Nj. (x, y)riNj*» (x, y), where 
/*, /** C /, may be significant. To avoid re-exploring can- 
didate solutions, we start from enumerating all the subsets 
/* C / of size 1, then we enumerate all the subsets I* C I 
of size 2, etc. For each subset, we only need to test one 
value of X = X* , where 



1 — Xi if i G /*, 
Xi otherwise. 



(4) 



For details see Algorithm |4l 

With the above algorithm, the Exhaustive Portions 
neighborhood is of size \N^^{x,y)\ = 2" • J^''^^^ {"^) and 
can be explored in 0{(^)n) C 0{m!'n) time. Thus, for a 
reasonably large instance, it is normally possible to use the 
Exhaustive Portions local search only with a very small k. 
When fc = 1, the algorithm becomes a simple local search 
that we call Flip heuristic. The corresponding neighbor- 
hood A^flip(a:°,y°) = {(a;°,y°)} U {(x,y) : x G {0,1}'", 
y G {0, 1}" and Y^kh = l} is of size |Aflip(x,y)| = 

m2" -t- 1 and can be explored in 0{mn) time. Observe 
that A'fiip(x,y) is larger than Na\t{x,y) although the time 
complexity of the corresponding exploration algorithms is 



4 



Algorithm 4: An efficient implementation of the Ex- 
haustive Portions local search. 

1 A ^ 1; 

2 while A = 1 do 



Algorithm 5: The VND heuristic. 



3 
4 
5 
6 
7 
8 
9 
10 
11 
12 

13 

14 



A ^ 
for p ■ 



1 to fc do 

0; 

while n < {'^) do 

Select next /* C /, |/*| = p; 
Calculate a;* according to (H)); 
if f{x*,y{x*)) > f(x,y{x)) then 
a; -s— x*; 
/i ^ 0; 

if p > 1 then A 1; 
else /i ^ A* + 1; 
if A = 1 then break; 



the same. This may indicate superior quality of the Flip 
heuristic. 

Note that Affiip(x°, ?/°) is not exactly equal to Nl^{x^ ^ ip) 
Observe that most of the BQP heuristics yield solutions 
such that y is already optimal for the given x. Thus, in our 
implementation of the Flip heuristic, we avoid exploration 
of solutions that are different from the given one only in 

y- 

Another strategy is to consider only a limited number 
of subsets /* £ /, each of a larger size k. The resulting 
algorithm is called Random Portions improvement heuris- 
tic which proceeds as follows. Let fc, 1 < fc < m, be a 
parameter of the algorithm. Randomly select a set I* C I 
such that |/*| = k. Replace the solution {x,y) with the 
best solution in Nj* (a;, y) and proceed to the next random 
set /*. Terminate when either the prescribed number of 
iterations is completed or the given time limit is reached. 
The time complexity of the Random Portions algorithm is 
0{mn+Kn2^), where k is the number of iterations, and the 



size of the explored neighborhood 
by |<;^t(^",2/°)l<^2^-+"- 



portv''' ■ 



y 



is bounded 



3.3. Variable Neighborhood Descent 

Given sufficient k, the Random Portions heuristic may 
be effective in removing a solution from a deep local maxi- 
mum but it may be unnecessary slow when applied to low 
quality solutions. Thus, a good strategy is to start from a 
fast heuristic and, when it reaches a local maximum, apply 
the Random Portions algorithm for some k. If it succeeds 
in removing the solution from the local maximum, go back 
to the fast heuristic. Otherwise proceed to the Random 
Portions algorithm with a larger value of k. Such an algo- 
rithm can be viewed as an imple mentation of the Variable 
Neigh borhood Descent (VND) ( Hansen and Mladenovid 
20031) . and we implemented it as discussed in Algorithm [S] 



1 Produce a solution (x, y) with the Greedy algorithm; 

2 A ^ 1; 

3 while A = 1 do 



9 
10 

11 

12 
13 
14 



A ^ 0; 

Improve the solution {x,y) with the Alternating 
local search; 

Improve the solution {x,y) with the Flip local 
search; 

if the solution was improved by the Flip local 
search then 
L A^l; 

else 

for fc ^ 2 to p do 
for ^ <^ 1 to m do 

Select a random I* C I such that 
|/*| = fc and £ e /*; 
Fix Xi, i ^ I* and solve the remaining 
problem exactly; 
If the solution was improved, set 
A ^ 1; 



15 return the objective of {x,y) and {x,y)] 



In our VND, the initial solution is constructed with 
the Greedy heuristic. Then it is repeatedly improved with 
Alternating and Flip heuristics until a local maximum in 
Nait{x,y) U Nfiip{x,y) is reached. Next, the Random Por- 
tions (fc — 2) local search is applied. If it does not succeed, 
the Random Portions {k — 3) local search is applied, etc. 
If at some point the solution is improved, the algorithm 
restarts with Alternating and Flip heuristics. 

The VND neighborhood AvND(a;,y) is defined as fol- 
lows: 

k 

NvND{x,y) = iVait(a;,y)UiVflip(a;,y)u|J A^;^(x,2/) . (5) 

p=2 

Let us calculate the intersections of the components of ([5]) . 
Observe that the intersection of the first two neighbor- 
hoods is smaU: 7Vait(x", 2/°) n Nfiipix°,y°) = {{x°,y°)}U 



{{x,y°) : x 
iVait(xO,2/0 

,0 



n iV/.(xO,2/0) = 



••i — x^\ = lY In contrast, 
{{x\y) : y e {0,1}"} U 
XG{0,1}", x,^x°,ioi-t^r},i.e.,N^-";,{x,y) 
includes a large part of Nait{x, y). Moreover, A^Bip(x, y) C 
Npl^^{x, y) if the Random Portions heuristic is implemented 
as in Algorithm [5] 

Let us now estimate the size of N^^^Kx, y)riNp^f.{x, y). 
In particular, we will show that the probability of choosing 
a set /* of size p < k such that Nj* {x, y) C iVpo™(a:, y) is 
low. Observe that 



PiA cB)^ — 
m 



k k-1 



k — p + I kl ■ (m — p) ! 



TO 



1 



TO 



P+l 



{k-p)\ 



5 



ii A,B e / are selected randomly, \A\ — p and \B\ = k. 
Thus, the probability that N^' (x, y) C N^oTti^, y) if |/*| = 
p is 

k\ ■ (m — p)! ~ 



To solve BQP((5, c, d, CT), compute 



m 



I . 



(fc-p)! 



(6) 

Note that, in Algorithm [SJ the sets /* are not exactly 
arbitrary but for simplicity we can ignore it. For small k 
and p, 2 < p < k, we have 

hm P{N''{x,y)cN^-Tt{x,y)) 



lim 1 
lim 1 



cxp 



fc! 1 

(fc — p)! mP 
k\ 



0. 



(fc -p)!mP-i 

Even if TO = 100, A: = 3 and p = 2, the probability ^ 
is below 6%, i.e., only a small number of neighborhoods 
Ni*{x,y) dominated by N^'^^{x^y) are selected within 
Np^t(x,y). 

Thus, the neighborhood A^/ndI^:, y) is significantly larger 
than any of its components which indicates the potential 
of the VND heuristic to yield superior solutions. Although 
the quick heuristics Alternating and Flip are mostly dom- 
inated by following algorithms, it is expected that they 
speed up the VND method. 

Another variation of VND that we call VND Exhaus- 
tive alternates between the Alternating heuristic and the 
Exhaustive Portions algorithm. As for the Flip heuristic, 
our implementation of VND Exhaustive does not explore 
solutions : y G {0,1}"} within the Exhaustive 

Portions local search. 

3.4- Multi-Start Algorithms 

Multi-Start method is a simple yet efficient technique 
that have bee n applied to many combinatorial optimiza- 
tion problems (|Martil 120031 ). There exist several variations 
of the metaheuristic; the one that we used is as follows: 
(a) produce a random solution; (b) improve it; (c) save the 
obtained solution if it is better than the best one found so 
far; (d) terminate if the given time has elapsed or proceed 
to the next random solution otherwise. 

We have three fast local search algorithms, namely Al- 
ternating, Flip and VND Exhaustive [k = 1). Thus, we 
consider three Multi-Start heuristics. 

3.5. Row-Merge Algorithms 

Another approach to reduce the size of the problem is 
to partition its rows into clusters and consider all the rows 
in each cluster merged together. Let 3 = [P.P, ...,/*''} 
be a partition of I into clusters , , . . . , . Introduce 
a problem BQP(Q, c, d, 3): 

maximize x^Qy -\- cx -\- dy (7) 
subject to Xi = X( for i,£ & C, C G CJ, (8) 

xe{0,ir,ye{0,ir. (9) 



E 



r£l' 



and d* = d 



for i G {1, 2, . . . , k}. Let {x* ,y*) be an optimal solution of 
BQP {Q*,c*,d*), where Q* = (g,* ) is a fc x n matrix. Let 
Xr — X* for each r e P and z e {1, 2, . . . , k}. Then (x, y*) 
is an optimal solution to BQP((5, c, d, 3). 

We call this method Clustering Row-Merge heuristic. 
Its quality depends significantly on the partition 3. In 
particular, it is expected that rows i and i are 'merged' 
only if Xi = xe in most of near-optimal solutions of the 
problem. We propose the following partitioning technique. 

Let S'\ S^,...,SP be p 'good solution' to BQP(Q, c, d). 
Create a complete weighted graph G with a node set /. 
Let the weight wu of an edge (z, i) in G be the number of 
solutions {x, y) G {S^.S"^, . . . , S^} such that Xi — xg. For 
a set of nodes G C /, let 

min Wu if \G\ > 1, 
p otherwise 



and 



w(C) = \G\■^l{G). 



(10) 



Define the Partitioning Problem be the problem of finding 
a partition 3 — {1^,1^, . . . , I*^} of a given size k such that 

^(3) = J2'i=i w{l'') is maximized. Since the weight w{P) 
of a cluster P is proportional to the weight of the lightest 
edge in the clique induced by /*, the elements of such a 
partition will tend to have no light edges. It is important 
to note that w{G) is also proportional to the cardinality 
of G and, hence, it is preferable to have the number of 
clusters with light edges as small as possible. 

Theorem 3. The Partitioning Problem is strongly NP- 
hard even if k — 2. 

Proof. We reduce the maximum cliqu problem to the par- 
titioning problem. Let G = (/, E) be a given graph. Let 
G' = {P ,E') be a complete weighted graph with node set 
I' = I U {v}. Let Wii — p ii {i,£) £ E and wu = other- 
wise, where wu is a weight assigned to the edge {i, £) £ E. 
Observe that in such a graph, w{G) — \G\p if C C /' is a 
clique and w[G) — otherwise (see fTU)) . 

Let k = 2 and 3 = {P, P} be an optimal partition of 
/'. For simplicity, assume v £ P. Then there exist the 
following options: 

1. If |/| = 1, the problem is trivial, and the maximum 
clique in G is /. 

2. If w{3) — p, the graph G is an independent set. 
Indeed, if there exists an edge (z, £) £ E, then setting 
P = {i, £} and P = P\P would result in w{3) = 2p. 
If |/| = 1, the partition P = {v} and P = I provides 
a solution with objective value 2p. 
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3. If w(3) = \I'\p, then G is complete. Indeed, if 
|/^| > 1, then is not a chque and, thus, w{I^) = 0. 
Provided that w{P) < \P\p < \I'\p, we get = {v} 
and w{I^) = p. Therefore, w{P) = \I\p and, thus, 
7^ = / is a clique in G, i.e., G is complete. 

4. If p < w{3) < \I'\p, then P is a maximum clique in 
G. Indeed, if is not a clique in G, then 'w{P) = 
and mil) < p since w{I^) < p. Thus, P is a clique 
and |7^| > 1 resulting in w{I^) = 0. Observe that 
w{3) = \P\p and, therefore, P is maximized. The 
result follows. 

5. Note that > p for any graph G because the 
weight of a partition 7^ = {v} and 7^ = 7 is at least 
P- 

□ 

Since the Partitioning Problem is NP-hard, we use a 
heuristic approach to solve it. Consider the Greedy Parti- 
tioning Algorithm that proceeds as follows. Create a par- 
tition 3 = {{!}, {2}, . . . , {to}}. In each iteration, choose 
a pair P,Q eJ such that ■w{3 U {P U Q} \ {P, Q}) is max- 
imized and update 3 -s- 3 U {P U Q} \ {f", Q}- Repeat this 
until PI = k. 

The algorithm performs m — k iterations. In each iter- 
ation, there are pairs of elements of 3 to be compared. 
It takes O(m^) time to calculate the weight of a merged 
cluster. Hence, the time complexity of the Greedy Parti- 
tioning Algorithm is O(m^), which is unreasonable even 
for moderate instances. The following is a modification of 
the algorithm that terminates in 0(m? log to) time. 

For P, Q C 7, let S{P, Q) = w(P U Q) - w{P) - w{Q) 
and niP, Q) = n{P U Q). Set n{P) = p for each P in the 
initial partition 3 = {{1}, {2}, . . . , {to}}. Calculate the 
values /u(P, Q) for each P ^ Q £ 3 and place all these 
pairs in an ordered set L. Sort L by S{P, Q) in descending 
order. Let (P, Q) be the first element in L. Merge P and 
Q by updating 3 ^ 3 U (P U Q) \ {P, Q}. Remove (P, Q) 
from L. For each P e 0^ \ {P U Q}, also remove (P, P) 
and (P, Q) from L and insert (P, P U Q) such that the 
ordering of L is preserved. Note that ii{P U Q) = A*(P, Q) 
and ^J,{R, PUQ) can be calculated in 0(1) time as 

/x(P, P U Q) = min {/i(P, g), ^(P, P), Q)} • 

Repeat the procedure until \3\ = k. The first iteration 
of the algorithm takes 0(m^ log m) time. Then, on each 
iteration, the algorithm spends O(TOlogTO) time to update 
L and the values of ^. Since the number of iterations is m— 
k, the total complexity of the algorithm is 0{rn^ log to). 

By exploiting the fact that p is usually a small fixed 
number, we are able to further speed up the Greedy Parti- 
tioning Algorithm. Observe that —pm < S{P, Q) < 0. For 
each w = —pm, ~pm + 1, . . . , 0, introduce a list that 
contains all the pairs (P, Q) such that S{P, Q) = w, where 
P and Q are in the initial partition. Creating such lists 



takes only 0(m^) time. On every iteration, select an ar- 
bitrary pair of cliques (P, Q) £ aw, where w is the largest 
index such that 7^ 0. Since the number of lists a^j is 
0(to), this operation takes only 0(m) time. The updating 
procedure is similar to the one described above and also 
takes 0{m) time. Thus, the complexity of the algorithm 
is 0{m?). We used this implementation in the Clustering 
Row-Merge algorithm. 

Observe that both implementations of the Greedy Par- 
titioning Algorithm require quick operations with pairs 
{P,Q). In our implementation, we maintain a hash ta- 
ble for each P € 3. Such a hash table stores all the pairs 
{P,Q), where Q £ 3. As the hash key, we use the first 
vertex Qi in Q. Since the clusters are non-intersecting, 
such a key is unique and limited by 1 < Qi < m. 

The Clustering Row- Merge heuristic effectively exploits 
information obtained from several near-optimal solutions 
of the problem and, thus, may be useful in evolutionary 
algorithms and other metaheuristics. However, if such so- 
lutions axe unavailable, the Clustering Row-Merge algo- 
rithm cannot be used. In order to apply the idea of row 
merging in a standalone heuristic, we propose the following 
method. Generate a random partition 3 = . . . ,1'^} 

of 7 such that P ^ % for each i. Solve BQP((5, c, d, 3) as 
described above and improve the resulting solution with a 
fast local search. If the obtained solution is better than 
the best known so far, save it. Repeat the procedure until 
the given time is elapsed. We call this method Multi-Start 
Row-Merge. 

The Multi-Start Row-Merge algorithm can be modified 
to be used as an improvement procedure. Let (.t, y) be the 
starting solution. Generate a random partition 3{x) of 7 of 
size k such that Xi = xe for each x,£ & C, where C G 3{x). 
A solution to BQP{Q,c,d,3{x)) is the best solution in 
the neighborhood A^merge(a;°, 2/°) = {{x,y) : x G {0,1}", 
Xi = xiiii,l € C, where C G 3{x°), and y e {0, 1}"}. 
The size of this neighborhood is \N^e,-gc{x" ,y^)\ = 2*^+" 
and it takes 0{n2^) time to explore it. Observe that 
{x,y) G Nnicigc{x,y) and, thus, the corresponding local 
search never worsens the solution. We call this improve- 
ment procedure Row-Merge Local Search. 

Assume (x, y) G A^^mergo , j/°) is an improvement over 
J/"). Then x is different from x° in at least |C| values, 
where C G 3{x°). Intuitively, it is unlikely that such an 
improvement is possible if {x'^,y°) is a near-optimal solu- 
tion and the cluster C is large. Thus, this local search 
can be efficient only if k is large. Hence, it is impractical 
to solve BQP{Q,c,d,3{x)) exactly. Note, however, that a 
heuristic solution to BQP((5, c, d, 3{x)) may be worse than 
{x°,y°) and, thus, the Row-Merge Local Search algorithm 
needs to evaluate a solution before accepting it. 

4. Testbed 

There is no standard testbed available in the; literature 
for BQP. Thus, we have created a testbed consisting of five 
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instance types that correspond to some real applications 
of the problem. 

In order to generate some of the instances, we will need 
random bipartite graphs. To generate a random bipartite 
graph G = [V, U, E), we define seven parameters, namely 
m = \V\, n — |C/|, di, di, d^, d2 and ji such that Q < di < 
di < n, < d2 < d2 < m, mdi < nd2 and mdi > nd^. 

The bipartite graph generator proceeds as follows. 

1. For each node v € V , set d^ as a uniformly dis- 
tributed random integer in the range d^ < dy < di. 

2. For each node u e C/, set d^ as a uniformly dis- 
tributed random integer in the range d2 < d^ < d2. 

3. While J2vGv'^'' ^ Sugc/^"' alternatively select a 
node in y or [/ and regenerate its degree as described 
above Q 

4. Create a bipartite graph G = {V, [/, E), where E = 

5. Select a node v such that dy > degv (if no such 
node exists, go to the next step). Let U' — {u ^ U : 
degu < du and ^ E}. If U' ^ 0, select a node 
u G U' randomly. Otherwise randomly select a node 
u ^ U such that {v,u) ^ E and dy > 0; randomly 
select a node v' e V adjacent to u and delete the 
edge {v',u). Add an edge {v,u). Repeat this step. 

6. For each edge {v,u) G E, the weight Wyu is a nor- 
mally distributed integer {a — 100 and fi is given). 

The following are the instance types used in our com- 
putational experiments. 



1. The Random instances are as follows: 



and d-i 



are normally distributed random integers with fi — 
and cr = 100. 

2. The Max Biclique instances model the problem of 
finding a biclique of the maximum weight in a bipar- 
tite graph. Let G = (I, J, E) be a random bipartite 
graph with d-^ = n/5, di = d^ — m/5, d2 = m 
and fj, = 100. Note that setting ^ to would make 
the weight of any large biclique likely to be around 
and, thus, the landscape of the problem would be 
rather flat. If Wij is the weight of an edge («, j) G E, 



set q^j 



for every i e / and j £ J if G E 



and Qij = —M otherwise, where M is large number. 
Set c = 0™ and d = 0". 



In practice, if m{d^ +d.\) ^ n{d2 +d2), this algorithm converges 
very quickly. However, in theory it may not terminate in finite time. 
Thus, one can update it as foUovi^s. If the equality ^^y(zY — 
SuGC/ '1°* achieved after a certain number of iterations, start 

a fixing procedure as follows. Let 5 = 1 if J^ugv'^" < Eugt?^" 
and S = —1 otherwise. While "^^^y dy ^ Eugc/"^" md^ < 
^ + ^vev — "^'^1 1 select randomly v G V such that < dy + 5 < 
di and update dy <— dy + S. While "^Zy^v '^^ SIusu'^"' ^^^sct 
randomly u G U such that ^2 < du—S < d2 and update du i— du—S- 



3. The Max Induced Subgraph instances model the prob- 
lem of finding a subset of nodes in a bipartite graph 
that maximizes the total weight of the resulting in- 
duced subgraph. The Max Induced Subgraph in- 
stances are similar to the Max Biclique instances ex- 
cept that Qij = if (i, j) ^ E and ^ = 0. The lat- 
ter change is needed because having too many edges 
with positive weights would make the problem very 
simple (indeed, including all or almost all the nodes 
would yield a good solution). 

4. The MaxCut instances model the MaxCut problem 
as follows. First, we generate a random bipartite 
graph as for the Max Induced Subgraph instances. 



Then, we set qij = —2wij if (i,j) G E and 



= Oif 

^ E. Finally, we set Ci — ^ X^j pj lij and dj = 
For an explanation, see ([Punnen et al 



3 ^i gJ ' 

20121) . 



5. The Matrix Factorization instances model the prob- 
lem of producing a rank one approximation of a bi- 
nary matrix. The original matrix H = (hij) (see 
Section [Ij is generated randomly with probability 



0.5 of h,. 



1 . The values of qij are then calculated 



as qij = 1 — 2hij, and c — 0™ and d — 0". 

The proposed instances are hard to solve exactly and, 
thus, we are unaware of the optimal objective values of any 
of the moderate and large instances used in our computa- 
tional experiments. Thus, in our experiments, we use the 
best known objective values to evaluate the quality of the 
obtained solutions. All our test instances and best known 



solutions can be found at http : //www . sf u . ca/~dkarapet/ 



5. Empirical Evaluation 

In this section, we provide empirical analysis of the 
algorithms discussed above. Our experiments were con- 
ducted on an Intel 17-2600 CPU based PC. All the algo- 
rithms are implemented in C=ff 4.0, and no concurrency is 
used. 

Where appropriate, we denote the algorithms as fol- 
lows. 'Rn', 'G', 'A' and 'F' stand for Random (a proce- 
dure that generates a random solution) , Greedy, Alternat- 
ing and Flip algorithms, respectively. We use parentheses 
to indicate the initial solution for a local search. For exam- 
ple, 'A(G)' denotes the Alternating local search applied to 
a Greedy solution. 'P^', 'V^' and V^'^ stand for Random 
Portions, VND and VND Exhaustive algorithms, respec- 
tively. Multi- Start metaheuristic is denoted by 'M', where 
the local search procedure is specified in parentheses. The 
Clustering Row-Merge, Multi-Start Row-Merge and Row- 
Merge Local Search algorithms are denoted by 'Rfe', 'R™' 
and 'Rfc ', respectively. 

5.1. Exact Algorithms 

Several of the heuristic techniques described above re- 
quire an exact algorithm to solve a reduced problem. In 
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this study, we tested two algorithms. We cah the first al- 
gorithm Exhaustive Enumeration. Since calculation of the 
optimal y for a fixed x can be done efficiently, we only need 
to try all possible x, i.e., 2™ values. Thus, with a care- 
ful implementation, this algorithm terminates in 0(ri2™) 
time. The second algorithm uses a mixed integer program- 
ming solver applied to the following problem: 



maximize 



subject to 



EE 

iel j£J 



E 

i6/ 



e {0,1} 
< % < 1 

< z,j < 1 



for i £ I and j G J, 
for i e / and j G J, 
for i E I and j G J, 
for i G /, 
for j G J, 

for i G I and j G J. 



We will refer to this as MIP approach. 

As a mixed integer programming solver, we used CPLEX 
12.4. An important parameter of CPLEX is the branch 
and bound strategy of the solver. We found that the fea- 
sibility emphasis usually provides the best performance 
in our experiments. For the other CPLEX parameters, 
we used the default values except that we disabled mul- 
tithreading. To produce starting solutions for the MIP 
approach, we used the Greedy algorithm followed by the 
Alternating local search. Giving a starting solution is 
not necessary for CPLEX but we found that providing 
it speeds up the optimization. 

The results of our experiments with the exact algo- 
rithms are reported in Table [TJ The performance of the 
Exhaustive Enumeration method practically does not de- 
pend on the instance type; on our computational platform, 
the running time of the Exhaustive Enumeration algorithm 
can be roughly approximated as 



t, 



exhaustive 



= 10"" • n2" sec. 



(11) 



In contrast, the behaviour of the MIP approach signifi- 
cantly depends on the size and the type of the problem, see 
Table [TJ Observe that the MIP approach usually outper- 
forms the Exhaustive Enumeration algorithm, especially 
for larger instances. This result, however, holds only for 
small values of n. Indeed, the performance of the MIP 
approach algorithm dramatically depends on the number 
of columns while the performance of the Exhaustive Enu- 
meration algorithm is linear in n, see Figure [1] Thus, 
the Exhaustive Enumeration algorithm is clearly the best 
choice for the instances with large n. Since the reduced 
problems arising within Random Portions, VND and Row- 
Merge heuristics preserve the number of columns of the 
original instance, we used the Exhaustive Enumeration al- 
gorithm in our implementations when an exact solution 
was needed. 

Another interesting observation is that the integer pro- 
gramming formulation cannot be used to obtain a good 



upper bound for the BQP. The LP relaxation of the prob- 
lem provides an objective that is normally several times 
larger than that of the optimal solution. One can also use 
a mixed integer program solver to find an upper bound 
by giving it a limited time. We noticed, however, that, in 
case of the BQP, the gap between the lower and the up- 
per bounds is usually large until the very last steps of the 
algorithm, see Figure [51 

It is interesting to test the heuristic algorithms on the 
small instances. Our experiments showed that the per- 
formance of the fastest algorithms (Greedy, Alternating, 
Flip and VND Exhaustive (fc = 1)) is relatively good, on 
average, but those algorithms rarely reach the optimality. 
Increasing the value of k in the VND Exhaustive heuris- 
tic significantly improves the solution quality. However, 
even when k = 7, VND Exhaustive was not able to obtain 
optimal solutions for certain instances while the running 
time of the heuristic was roughly 10 minutes for each of 
the instances of size 50 x 50. 

It turns out that one of the most efficient approaches 
for the small instances is the Multi-Start heuristic used 
with a fast local search. Even being given only 10 itera- 
tions, M(V°'^) produced optimal solutions for all the Ran- 
dom instanceslj With 100 iterations, M(V5;'') reaches opti- 
mality for all the Random, Max Induced Subgraph, Max- 
Cut and Matrix Factorization instances. By further in- 
creasing the number of iterations, we were able to improve 
the quality of M(V°'') solutions for the Max Biclique prob- 
lems though, even for 100 000 iterations, the algorithm did 
not reach the optimal solution for the Max Biclique prob- 
lem of size 50 X 50. Note, however, that for most of the 
test instances, M(V5'') (100 000 iterations) terminated in 
less than 10 seconds. Moreover, given only 1000 iterations, 
M(V5'') performs as good as if it is given 100 000 iterations. 

The Row-Merge heuristics also work well for the small 
instances. For example, R20, where the reduced problem is 
solved with Vj^ and that is given as much time as V4'^ takes 
to solve the corresponding instance (i.e., less than 1 second 
for most of the instances) , yielded optimal solutions for all 
the test problems. 

5.2. Fast Heuristics 

We first analyse the performance of our fastest heuris- 
tics: Random, Greedy, Alternating, Flip and VND Ex- 
haustive (fc ~ 1). The results of the experiments for the 
Random instances can be found in Table [5] For each of 
the heuristics, the table reports the relative gap between 
the solution obtained and the best known solution, and 
the running time. The objective gap is calculated as 



gapif) 



/i 



best 



/ 



100% , 



/ best 



(12) 



^In fact, optimal solutions to some of the test problem were not 
available to us (see Table in which cases we used the best known 
solutions to evaluate the heuristics. 



9 



TTl X Tl 


fi nrl nm 


Max Rirl 


Max Tnrl 


MaxCiit, 


Matr Fart, 


1 j 1 1 tit mo Lf 1 V v_' 1 J 1 1 • 


ZU X ou 


9 9 


u.z 


U.o 


A 9 


9*^ 9 
Zo.Z 


u.o 


25 X 50 


64.9 


0.4 


3.1 


55.3 


419.9 
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62.4 
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4820.1 


475.5 


35 X 50 


10956.4 


0.8 


45.8 


3413.6 
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40 X 50 




4.2 


124.4 






w 5.5 ■ 10^ 


45 X 50 




7.4 


691.3 






w 1.8 ■ 10^ 


50 X 50 




35.7 


2 658.6 






« 5.6 ■ 10^ 



Table 1: Evaluation of exact algorithms. For each instance, the running time, in seconds, is reported for the MIP approach (if it does not 
exceed 5 hours). The last column reports the running time of the Exhaustive Enumeration algorithm, in seconds, for the Random instances 
of the given size (for the instances of size m > 40, the reported times are obtained according to ((TT}). The running time of the MIP approach 
includes the time spent on generating starting solutions. 




The number of columns n 

Figure 1: The performance of the MIP approach significantly depends on the number of columns n while the running time of the Exhaustive 
algorithm is linear in n. The experiment was conducted for Max Biclique instances, m = 20, n = 20, 40, . . . , 500. 



where / is the objective value of the solution obtained 
and /best is the objective of the best known solution for 
that test instance (note that for any practical instance 
/best > 0). Where uncertainty is present (e.g., in 'Rn' or 
'A(Rn)'), the experiment is repeated 10 times, and the av- 
erage value is reported. The last row of the table reports 
the average objective gap and the average running time for 
each heuristic. These values, however, should be consid- 
ered with caution as, e.g., the running times of a heuristic 
may vary dramatically for different instances. 

Recall that the expected objective of a random solution 
can be calculated according to Since the expectation 
for all the weights in a Random instance is 0, the objec- 
tives of all the resulting solutions are also close to and 
the objective gaps are close to 100%, see This result 
also holds for the Max Induced Subgraph, MaxCut and 
Matrix Factorization problems. However, it is different for 
a Max Biclique instance for which the expected objective 
value is a large negative number (recall that, in a Max 
Biclique problem, qij = — M if there is no edge between i 
and j in the original bipartite graph). Thus, the objective 
gaps of the random solutions for Max Biclique are very 
large, and the Alternating algorithm quickly converges to 
near-zero solutions (recall that, according to Theorem [51 
the solution improved with the Alternating local search 
can never have negative objective). It shows that the Al- 
ternating algorithm normally should not be applied to so- 



lutions with negative objective values as the Alternating 
local search tends to fall into a trivial local maximum of 
{x,y) = (0™, 0") in such cases. Also note that, among the 
quick heuristics, V5'^(Rn) shows the best performance for 
Max Biclique problems and F(Rn) performs poorly. 

Other than that, the performance of the fast heuristics 
has the same pattern for the other instance types. The 
Greedy algorithm is a good option to obtain a reasonable 
solution in a very short time; it is the fastest heuristic in 
most of the experiments. The Alternating and Flip algo- 
rithms are yet two fast and efficient local searches; the Al- 
ternating heuristic performs faster but the Flip algorithm 
yields better solutions. The solution quality of Vf^ is sim- 
ilar to that of the Flip heuristic though Vf^ is notably 
slower. However, the neighborhood of Vf^ is clearly larger 
than that of the Flip local search and, thus, being applied 
to a near-optimal solution, V^'^ is expected to demonstrate 
superior performance. 

Whether we start a local search from a random or a 
Greedy solution, the performance of the local searches re- 
mains approximately the same. However, we noticed that 
starting from the Greedy algorithm yields, on average, 
slightly better solutions and, thus, in what follows, we use 
Greedy solutions as the starting ones by default. 
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Figure 2; The progress of the MIP algorithm for a Random instance of size 35 X 50. The gap between the proven upper bound and the 
optimal solution remains significant until the very last steps of the algorithm. The starting solution is only 0.68% away from the optimal one, 
and the optimal solution is obtained after 966 seconds. 
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1.2 


0.8 


OA 


0.4 


0.5 


0.4 


186.7 


354.3 


992.0 


2670.9 


3000 X 5000 


101.3 


3.6 


0.9 


0.7 


0.4 


OA 


0.5 


0.3 


278.7 


508.9 


2544.9 


4530.5 


4000 X 5000 


100.5 


3.9 


0.7 


0.7 


0.3 


OA 


0.5 


02 


369.4 


685.3 


1557.7 


6301.4 


5000 X 5000 


99.5 


4.2 


0.8 


0.7 


06 


0.6 


0.6 


0.7 


460.7 


1009.4 


4761.4 


1492.0 


Average 


100.6 


3.5 


1.4 


1.0 


05 


0.5 


0.6 


0.5 


142.9 


279.2 


1050.2 


1620.8 



Table 2: Evaluation of fast heuristics on the Random instances. The running times of the local searches started from random solutions are 
close to those started from Greedy solutions. The time needed to generate a random solution is negligible and, thus, is also skipped. 



5.3. Portions-Based and Multi-Start Heuristics 

In Table[3l we compare Portions-based and Multi-Start 
heuristics, which include the VND Exhaustive (V^^), VND 
(Vfc), Random Portions (P^) and Muhi-Start (M(A), M(F) 
and M(V^'')) algorithms. For a fair competition, we give 
each of the Random Portions and Multi-Start algorithms 
the same amount of time as the VND (fc — 6) takes to solve 
the corresponding instance. We also studied performance 
of the heuristics being given smaller and larger times but 
this did not yield any new observations. 

The most successful heuristic in this series of exper- 
iments is Multi-Start based on either Flip or V^'^ local 
search. Usually, M(F) dominates M(Vi'^); indeed, accord- 
ing to Table [U Flip is significantly faster than V^'' and, 
thus, Multi-Start is able to complete more iterations when 
using the Flip improvement procedure. However, for the 
Max Biclique instances, M(F) performs very poorly as a 
result of F(Rn) low performance (see Section 15.21) and, 
therefore, M(V°'') is preferable for the instances like Max 
Biclique. 

It is worth noting that for the Matrix Factorization in- 
stances, the Multi-Start algorithms are sometimes domi- 



nated by the P3 and/or P4 heuristics although, on average, 
M(F) shows slightly better performance; a possible reason 
for that is discussed in Section r5.4l In several experiments, 
M(F) and M(Vf^) were also outperformed by but at 
the cost of unreasonable running times. We suggest to use 
V^'^ with fc > 1 only for relatively small instances. 

According to our experiments, the optimal value of k 
for the Random Portions heuristic is fc = 4 (sometimes 
fc = 3). We conducted the tests for a wide range of k, 
given times and instances, and P^ was the winner in most 
such experiments. We believe, however, that, being given 
better starting solutions, the Random Portions heuristic 
would benefit from a larger value of k. 

Performance of the VND (fc = 6) is slightly worse, on 
average, than that of P4, and this tendency holds for all 
the instance types. In our experiments, we also tried a 
modification of the VND heuristic that restarts immedi- 
ately after finding an improvement. Considering that the 
Pfe algorithm is usually significantly slower than the Al- 
ternating and Flip local searches, this could speed up the 
heuristic. However, our experiments showed that such a 
change did not improve the performance of the VND. 
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Gap to the best known, % 



Time, sec 



m X n 








M(A) 


M(F) MiVf) 




P4 




Ps 








900 V 1 000 


1 


n 1 

U. lU 


1 1 

U. ± 1 


QA 


n 00 


03 






Oil 


1 zL 





1 





400 X 1000 


0.52 


0.52 


0.52 


0.61 


0.12 


0.09 


0.41 


0.40 


0.40 


0.63 


(10 


2.3 


0.5 


600 X 1000 


0.79 


0.57 


0.73 


0.32 


0.12 


0.15 


0.71 


0.63 


0.70 


0.68 


OJ. 


15.0 


0.8 


800 X 1000 


0.75 


0.60 


0.72 


0.43 


0.22 


0.27 


0.51 


0.50 


0.57 


0.55 


OA 


30.4 


1.8 


1000 X 1000 


1.08 


0.90 


1.07 


0.41 


0.28 


0.32 


0.46 


0.49 


0.52 


0.56 


OA 


47.2 


2.1 


1000 X 5000 


0.20 


0.19 


0.20 


1.08 


0.14 


0.15 


0.27 


0.29 


0.26 


0.28 


OJ 


220.0 


10.4 


2000 X 5000 


0.40 


0.40 


0.40 


0.74 


0.21 


0.23 


0.35 


0.35 


0.34 


0.36 


2J 


998.7 


19.6 


3000 X 5000 


0.33 


0.33 


0.33 


0.48 


0.15 


0.21 


0.29 


0.28 


0.34 


0.25 


45 


554.1 


34.4 


4000 X 5000 


0.24 


0.20 


0.24 


0.38 


0.19 


0.25 


0.37 


0.36 


0.37 


0.36 




5 256.9 


42.9 


5000 X 5000 


0.73 


0.38 


0.73 


0.45 


0.33 


0.32 


0.48 


0.56 


0.50 


0.48 


L5 


27688.1 


46.2 


Average 


0.52 


0.42 


0.51 


0.58 


0.18 


0.20 


0.40 


0.40 


0.41 


0.43 


M 


3 481.4 


15.9 



Table 3: Evaluation of slow heuristics on the Random instances. The running times of M(A), M(F), M(V°''), P3, P4, Pg and Pg are equal 
to that of Ve . 



5.4- Row-Merge Heuristics 

Unlike all other algorithms discussed above, the Clus- 
tering Row-Merge heuristic requires a number of good so- 
lutions to perform row clusterization. In an evolutionary 
algorithm, one can, e.g., use a subset of the last genera- 
tion for this purpose. In iterative heuristics, one can use 
the best p distinct solutions obtained so far. In our ex- 
periments, we generate synthetic solutions by producing 
p — 100 random solutions which are then further improved 
with VND Exhaustive {k — 1). Note that we do not in- 
clude the time spent on generating these solutions in the 
running time of the Clustering Row-Merge heuristic. 

Recall that the Clustering Row-Merge algorithm solves 
a constrained version of the BQP. Thus, even if the ob- 
tained solution is optimal with respect to the BQP((3, c, d, 3), 
it is unlikely that it remains a local maximum for either 
of the neighborhoods defined above after relaxation of the 
constraints (|5]). In our experiments, we apply V^'^ to every 
solution obtained by the Clustering Row-Merge heuristic. 

As it was mentioned in Section [XSl the Row-Merge Lo- 
cal Search cannot be efficient if k is small. In particular, it 
is necessary to keep the clusters in 3 small. Thus, we use 
k = [m/2j, k = [to/3J, etc. The resulting reduced BQP 
instances are, therefore, too large to be solved exactly. We 
use F(G) to obtain a solution {x* , y*) of such an instance 
and then apply the Flip local search to improve the re- 
stored solution (x, y). Same algorithms are used within the 
Multi-Start Row- Merge heuristic as it also performs better 
for small clusters (indeed, if a random cluster is large, then 
there is a high chance that neither assigning nor 1 to the 
corresponding variable x^. is reasonable). Note that these 
decisions were made after extensive experimentation with 
different construction and improvement procedures. 

The results of computational experiments for the most 
successful variations of the Row-Merge heuristics are re- 
ported in Table |H The Clustering Row-Merge heuristic 
shows outstanding performance for the Random instances 
and performs similarly to M(F) and M(V°'') for the other 



instances. Note, however, that generation of initial so- 
lutions (the corresponding procedure is denoted as Mioo 
in Table HJ for the Clustering Row-Merge heuristic often 
takes more time than running the Clustering Row-Merge 
algorithm itself and produces solutions superior to the so- 
lutions obtained by the Clustering Row-Merge heuristic. 
Our experiments with different values of p showed that 
reduction on the number of initial solutions worsens the 
quality of the Clustering Row-Merge method. Hence, the 
Clustering Row-Merge algorithm is impractical as a stan- 
dalone heuristic. We also tried to use a heuristic solver for 
the BQP((5, c, d, 3) problems in order to increase the value 
of k. We noticed that the better is the heuristic solver and 
the larger is k, the higher is the quality of the resulting 
solutions. However, the best performance was observed 
for the exact algorithm for BQP(Q, c, d, 3) and fc = 15 or 
fc = 20. It is also worth mentioning that we expect that 
using better initial solutions for the Clustering Row-Merge 
heuristic would improve its performance. 

The Multi-Start Row- Merge and Row-Merge Local Search 
algorithms demonstrate very good performance for cer- 
tain instances. In particular, the Multi-Start Row-Merge 
and Row-Merge Local Search heuristics outperform, on 
average, all other heuristics for the Biclique and Matrix 
Factorization instances, respectively. Indeed, iterative im- 
provements may not be efficient for the Biclique problems 
characterized by deep local maxima. The Multi-Start ap- 
proach utilized in the Multi-Start Row-Merge algorithm is 
preferable in such a case. In contrast, the landscape of the 
Matrix Factorization instances is rather flat and iterative 
improvements work well for it. These results hold even if 
we allow more or less time to each of the heuristics. 

6. Conclusion 

In this work, we considered an important combinato- 
rial optimization problem called Bipartite Unconstrained 
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Gap to the best known, % 



Time, sec 



Instance 




Mioo 


M(F) 


M(Vr) 


Rl5 


R20 


pm 

«'Lm/2j 




Tils 

«'Lm/2j 




Mioo 


Rl5 


R20 




0.11 


0.00 


0.00 


0.03 


0.08 


0.03 


0.01 


0.00 


0.11 


0.3 


1.2 


0.3 


9.1 


400 X 1000 


0.52 


0.06 


0.12 


0.09 


0.15 


0.14 


0.13 


0.11 


0.27 


0.5 


2.5 


M 


9.1 


600 X 1000 


0.73 


0.10 


0.12 


0.15 


0.22 


0.22 


0.12 


0.18 


0.40 


0.8 


3.4 


CL5 


9.2 


800 X 1000 


0.72 


0.21 


0.22 


0.27 


0.26 


0.24 


0.19 


0.16 


0.27 


1.8 


4.6 


M 


9.3 


1000 X 1000 


1.07 


0.22 


0.28 


0.32 


0.28 


0.23 


0.34 


0.30 


0.27 


2.1 


6.2 


M 


9.5 


1000 X 5000 


0.20 


0.08 


0.14 


0.15 


0.12 


0.10 


0.14 


0.12 


0.17 


10.4 


52.6 




74.1 


2000 X 5000 


0.40 


0.16 


0.21 


0.23 


0.19 


0.23 


0.21 


0.21 


0.21 


19.6 


131.4 


6^ 


100.8 


3000 X 5000 


0.33 


0.14 


0.15 


0.21 


0.12 


0.10 


0.22 


0.18 


0.18 


34.4 


215.3 


12.8 


92.7 


4000 X 5000 


0.24 


0.15 


0.19 


0.25 


0.13 


0.10 


0.20 


0.19 


0.23 


42.9 


280.3 


29.6 


109.7 


5000 X 5000 


0.73 


0.25 


0.33 


0.32 


0.09 


0.11 


0.35 


0.32 


0.38 


46.2 


337.3 


34.7 


128.0 


Average 


0.51 


0.14 


0.18 


0.20 


0.16 


0.15 


0.19 


0.18 


0.25 


15.9 


103.5 


M 


55.1 



Table 4: Evaluation of the Row-Merge heuristics on the Random instances. The running time of M(F), M(V™), R-|^/2J ' "'^'[m/aj ^'^'^ '^[r 
is fixed to that of Ve- The algorithm Mioo denotes M(Vj'') terminated after 100 iterations. 



0-1 Programming Problem. We proposed several heuris- 
tic approaches to construct and improve BQP solutions. 
Several fast algorithms such as the Greedy construction 
heuristic and the Flip local search may be a reasonable 
choice if the solution quality requirements are low. 

The success of slower techniques significantly depends 
on the instance type. In particular, we noticed that sim- 
ple Multi-Start heuristics perform very well for the Ran- 
dom instances. For the instances with rather flat land- 
scapes, iterative improvement methods work better. For 
the problems having some 'structure' (i.e., not purely ran- 
dom instances), the Row-Merge heuristics, being used with 
small clusters, provide superior solution quality comparing 
to simple Multi-Start algorithms. Thus, we believe, this 
technique is of use in real applications. 

The Clustering Row-Merge method also performs well 
for many problems though it can only be used within some 
metaheuristics. A more sophisticated VND algorithm did 
not work well in our experiments. 

In addition to the heuristics, we compared two sim- 
ple exact algorithms, namely the MIP approach and the 
Exhaustive algorithm. The former is generally faster for 
'square' instances, i.e., when n k, m, while the latter is 
significantly better when n ^ m. 

In order to reduce the size of the manuscript, we ex- 
cluded some of the tables from this paper. The full version 
of our experimental results, test instances and best known 
solutions are available at http : //www, sf u . ca/~dkarapet/[ 
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